home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
ADA Programming Guide
/
ADA Programming Guide.iso
/
ada_lrm
/
chap02.doc
< prev
next >
Wrap
Text File
|
1996-01-30
|
26KB
|
1,322 lines
The following document is a draft of the corresponding chapter of the
version of the Ada Reference Manual produced in response to the Ansi
Canvass. It is given a limited circulation to Ada implementers and to
other groups contributing comments (according to the conventions defined in
RRM.comments). This draft should not be referred to in any publication.
ANSI-RM-02-v23 - Draft Chapter
2 Lexical Elements
version 23
83-02-11
This revision has addressed all comments up to #5795
2. Lexical Elements
The text of a program consists of the texts of one or more compilations.
The text of a compilation is a sequence of lexical elements, each composed
of characters; the rules of composition are given in this chapter.
Pragmas, which provide certain information for the compiler, are also
described in this chapter.
References: character 2.1, compilation 10.1, lexical element 2.2, pragma
2.8
2.1 Character Set
The only characters allowed in the text of a program are the graphic
characters and format effectors. Each graphic character corresponds to a
unique code of the ISO seven-bit coded character set (ISO standard 646),
and is represented (visually) by a graphical symbol. Some graphic
characters are represented by different graphical symbols in alternative
national representations of the ISO character set. The description of the
language definition in this standard reference manual uses the ASCII
graphical symbols, the ANSI graphical representation of the ISO character
set.
graphic_character ::= basic_graphic_character
| lower_case_letter | other_special_character
basic_graphic_character ::=
upper_case_letter | digit
| special_character | space_character
basic_character ::=
basic_graphic_character | format_effector
The basic character set is sufficient for writing any program. The
characters included in each of the categories of basic graphic characters
are defined as follows:
(a) upper case letters
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
(b) digits
0 1 2 3 4 5 6 7 8 9
2 - 1
(c) special characters
" # & ' ( ) * + , - . / : ; < = > _ |
(d) the space character
Format effectors are the ISO (and ASCII) characters called horizontal
tabulation, vertical tabulation, carriage return, line feed, and form feed.
2 - 2
The characters included in each of the remaining categories of graphic
characters are defined as follows:
(e) lower case letters
a b c d e f g h i j k l m n o p q r s t u v w x y z
(f) other special characters
! $ % ? @ [ \ ] ^ ` { }
Allowable replacements for the special characters vertical bar (|), sharp
(#), and quotation (") are defined in section 2.10.
Notes:
The ISO character that corresponds to the sharp graphical symbol in the
ASCII representation appears as a pound sterling symbol in the French,
German, and United Kingdom standard national representations. In any case,
the font design of graphical symbols (for example, whether they are in
italic or bold typeface) is not part of the ISO standard.
The meanings of the acronyms used in this section are as follows: ANSI
stands for American National Standards Institute, ASCII stands for American
Standard Code for Information Interchange, and ISO stands for International
Organization for Standardization.
The following names are used when referring to special characters and other
special characters:
symbol name symbol name
" quotation > greater than
# sharp _ underline
& ampersand | vertical bar
' apostrophe ! exclamation mark
( left parenthesis $ dollar
) right parenthesis % percent
* star, multiply ? question mark
+ plus @ commercial at
, comma [ left square bracket
- hyphen, minus \ back-slash
. dot, point, period ] right square bracket
/ slash, divide ^ circumflex
: colon ` grave accent
; semicolon { left brace
< less than } right brace
= equal ` tilde
2.2 Lexical Elements, Separators, and Delimiters
The text of a program consists of the texts of one or more compilations.
The text of each compilation is a sequence of separate lexical elements.
2 - 3
Each lexical element is either a delimiter, an identifier (which may be a
reserved word), a numeric literal, a character literal, a string literal,
or a comment. The effect of a program depends only on the particular
sequences of lexical elements that form its compilations, excluding the
comments, if any.
2 - 4
In some cases an explicit separator is required to separate adjacent
lexical elements (namely, when without separation, interpretation as a
single lexical element is possible). A separator is any of a space
character, a format effector, or the end of a line. A space character is a
separator except within a comment, a string literal, or a space character
literal. Format effectors other than horizontal tabulation are always
separators. Horizontal tabulation is a separator except within a comment.
The end of a line is always a separator. The language does not define what
causes the end of a line. However if, for a given implementation, the end
of a line is signified by one or more characters, then these characters
must be format effectors other than horizontal tabulation. In any case, a
sequence of one or more format effectors other than horizontal tabulation
must cause at least one end of line.
One or more separators are allowed between any two adjacent lexical
elements, before the first of each compilation, or after the last. At
least one separator is required between an identifier or a numeric literal
and an adjacent identifier or numeric literal.
A delimiter is either one of the following special characters (in the basic
character set)
& ' ( ) * + , - . / : ; < = > |
or one of the following compound delimiters each composed of two adjacent
special characters
=> .. ** := /= >= <= << >> <>
Each of the special characters listed for single character delimiters is a
single delimiter except if this character is used as a character of a
compound delimiter, or as a character of a comment, string literal,
character literal, or numeric literal.
The remaining forms of lexical element are described in other sections of
this chapter.
Notes:
Each lexical element must fit on one line, since the end of a line is a
separator. The quotation, sharp, and underline characters, likewise two
adjacent hyphens, are not delimiters, but may form part of other lexical
elements.
The following names are used when referring to compound delimiters:
delimiter name
=> arrow
.. double dot
** double star, exponentiate
:= assignment (pronounced: "becomes")
/= inequality (pronounced: "not equal")
>= greater than or equal
2 - 5
<= less than or equal
<< left label bracket
>> right label bracket
<> box
References: character literal 2.5, comment 2.7, compilation 10.1, format
effector 2.1, identifier 2.3, numeric literal 2.4, reserved word 2.9, space
character 2.1, special character 2.1, string literal 2.6
2 - 6
2.3 Identifiers
Identifiers are used as names and also as reserved words.
identifier ::=
letter {[underline] letter_or_digit}
letter_or_digit ::= letter | digit
letter ::= upper_case_letter | lower_case_letter
All characters of an identifier are significant, including any underline
character inserted between a letter or digit and an adjacent letter or
digit. Identifiers differing only in the use of corresponding upper and
lower case letters are considered as the same.
Examples:
COUNT X get_symbol Ethelyn Marion
SNOBOL_4 X1 PageCount STORE_NEXT_ITEM
Note:
No space is allowed within an identifier since a space is a separator.
References: digit 2.1, lower case letter 2.1, name 4.1, reserved word 2.9,
separator 2.2, space character 2.1, upper case letter 2.1
2.4 Numeric Literals
There are two classes of numeric literals: real literals and integer
literals. A real literal is a numeric literal that includes a point; an
integer literal is a numeric literal without a point. Real literals are
the literals of the type universal_real. Integer literals are the literals
of the type universal_integer.
numeric_literal ::= decimal_literal | based_literal
References: literal 4.2, universal_integer type 3.5.4, universal_real type
3.5.6
2.4.1 Decimal Literals
2 - 7
A decimal literal is a numeric literal expressed in the conventional
decimal notation (that is, the base is implicitly ten).
decimal_literal ::= integer [.integer] [exponent]
integer ::= digit {[underline] digit}
exponent ::= E [+] integer | E - integer
2 - 8
An underline character inserted between adjacent digits of a decimal
literal does not affect the value of this numeric literal. The letter E of
the exponent, if any, can be written either in lower case or in upper case,
with the same meaning.
An exponent indicates the power of ten by which the value of the decimal
literal without the exponent is to be multiplied to obtain the value of the
decimal literal with the exponent. An exponent for an integer literal must
not have a minus sign.
Examples:
12 0 1E6 123_456 -- integer literals
12.0 0.0 0.456 3.14159_26 -- real literals
1.34E-12 1.0E+6 -- real literals with exponent
Notes:
Leading zeros are allowed. No space is allowed in a numeric literal, not
even between constituents of the exponent, since a space is a separator. A
zero exponent is allowed for an integer literal.
References: digit 2.1, lower case letter 2.1, numeric literal 2.4,
separator 2.2, space character 2.1, upper case letter 2.1
2.4.2 Based Literals
A based literal is a numeric literal expressed in a form that specifies the
base explicitly. The base must be at least two and at most sixteen.
based_literal ::=
base # based_integer [.based_integer] # [exponent]
base ::= integer
based_integer ::=
extended_digit {[underline] extended_digit}
extended_digit ::= digit | letter
An underline character inserted between adjacent digits of a based literal
does not affect the value of this numeric literal. The base and the
exponent, if any, are in decimal notation. The only letters allowed as
extended digits are the letters A through F for the digits ten through
fifteen. A letter in a based literal (either an extended digit or the
letter E of an exponent) can be written either in lower case or in upper
case, with the same meaning.
2 - 9
The conventional meaning of based notation is assumed; in particular the
value of each extended digit of a based literal must be less than the base.
An exponent indicates the power of the base by which the value of the based
literal without the exponent is to be multiplied to obtain the value of the
based literal with the exponent.
2 - 10
Examples:
2#1111_1111# 16#FF# 016#0FF# -- integer literals of value 255
16#E#E1 2#1110_0000# -- integer literals of value 224
16#F.FF#E+2 2#1.1111_1111_111#E11 -- real literals of value 4095.0
References: digit 2.1, exponent 2.4.1, letter 2.3, lower case letter 2.1,
numeric literal 2.4, upper case letter 2.1
2.5 Character Literals
A character literal is formed by enclosing one of the 95 graphic
characters (including the space) between two apostrophe characters. A
character literal has a value that belongs to a character type.
character_literal ::= 'graphic_character'
Examples:
'A' '*' ''' ' '
References: character type 3.5.2, graphic character 2.1, literal 4.2,
space character 2.1
2.6 String Literals
A string literal is formed by a sequence of graphic characters (possibly
none) enclosed between two quotation characters used as string brackets.
string_literal ::= "{graphic_character}"
A string literal has a value that is a sequence of character values
corresponding to the graphic characters of the string literal apart from
the quotation character itself. If a quotation character value is to be
represented in the sequence of character values, then a pair of adjacent
quotation characters must be written at the corresponding place within the
string literal. (This means that a string literal that includes two
adjacent quotation characters is never interpreted as two adjacent string
literals.)
The length of a string literal is the number of character values in the
sequence represented. (Each doubled quotation character is counted as a
single character.)
Examples:
2 - 11
"Message of the day:"
"" -- an empty string literal
" " "A" """" -- three string literals of length 1
"Characters such as $, %, and } are allowed in string literals"
2 - 12
Note:
A string literal must fit on one line since it is a lexical element (see
2.2). Longer sequences of graphic character values can be obtained by
catenation of string literals. Similarly catenation of constants declared
in the package ASCII can be used to obtain sequences of character values
that include nongraphic character values (the so-called control
characters). Examples of such uses of catenation are given below:
"FIRST PART OF A SEQUENCE OF CHARACTERS " &
"THAT CONTINUES ON THE NEXT LINE"
"sequence that includes the" & ASCII.ACK & "control character"
References: ascii predefined package C, catenation operation 4.5.3,
character value 3.5.2, constant 3.2.1, declaration 3.1, end of a line 2.2,
graphic character 2.1, lexical element 2.2
2.7 Comments
A comment starts with two adjacent hyphens and extends up to the end of the
line. A comment can appear on any line of a program. The presence or
absence of comments has no influence on whether a program is legal or
illegal. Furthermore, comments do not influence the effect of a program;
their sole purpose is the enlightenment of the human reader.
Examples:
-- the last sentence above echoes the Algol 68 report
end; -- processing of LINE is complete
-- a long comment may be split onto
-- two or more consecutive lines
---------------- the first two hyphens start the comment
Note:
Horizontal tabulation can be used in comments, after the double hyphen,
and is equivalent to one or more spaces (see 2.2).
References: end of a line 2.2, illegal 1.6, legal 1.6, space character 2.1
2.8 Pragmas
2 - 13
A pragma is used to convey information to the compiler. A pragma starts
with the reserved word pragma followed by an identifier that is the name of
the pragma.
pragma ::=
pragma identifier [(argument_association {, argument_association})];
argument_association ::=
[argument_identifier =>] name
| [argument_identifier =>] expression
2 - 14
Pragmas are only allowed at the following places in a program:
- After a semicolon delimiter, but not within a formal part or
discriminant part.
- At any place where the syntax rules allow a construct defined by a
syntactic category whose name ends with "declaration", "statement",
"clause", or "alternative", or one of the syntactic categories variant
and exception handler; but not in place of such a construct. Also at
any place where a compilation unit would be allowed.
Additional restrictions exist for the placement of specific pragmas.
Some pragmas have arguments. Argument associations can be either
positional or named as for parameter associations of subprogram calls (see
6.4). Named associations are, however, only possible if the argument
identifiers are defined. A name given in an argument must be either a name
visible at the place of the pragma or an identifier specific to the pragma.
The pragmas defined by the language are described in Annex B: they must be
supported by every implementation. In addition, an implementation may
provide implementation-defined pragmas, which must then be described in
Appendix F. An implementation is not allowed to define pragmas whose
presence or absence influences the legality of the text outside such
pragmas. Consequently, the legality of a program does not depend on the
presence or absence of implementation-defined pragmas.
A pragma that is not language-defined has no effect if its identifier is
not recognized by the (current) implementation. Furthermore, a pragma
(whether language-defined or implementation-defined) has no effect if its
placement or its arguments do not correspond to what is allowed for the
pragma. The region of text over which a pragma has an effect depends on
the pragma.
Examples:
pragma LIST(OFF);
pragma OPTIMIZE(TIME);
pragma INLINE(SETMASK);
pragma SUPPRESS(RANGE_CHECK, ON => INDEX);
Note:
It is recommended (but not required) that implementations issue warnings
for pragmas that are not recognized and therefore ignored.
References: compilation unit 10.1, delimiter 2.2, discriminant part 3.7.1,
exception handler 11.2, expression 4.4, formal part 6.1, identifier 2.3,
implementation-defined pragma F, language-defined pragma B, legal 1.6, name
4.1, reserved word 2.9, statement 5, static expression 4.9, variant 3.7.3,
visibility 8.3
Categories ending with "declaration" comprise: basic declaration 3.1,
component declaration 3.7, entry declaration 9.5, generic parameter
declaration 12.1
2 - 15
Categories ending with "clause" comprise: alignment clause 13.4, component
clause 13.4, context clause 10.1.1, representation clause 13.1, use clause
8.4, with clause 10.1.1
Categories ending with "alternative" comprise: accept alternative 9.7.1,
case statement alternative 5.4, delay alternative 9.7.1, select alternative
9.7.1, selective wait alternative 9.7.1, terminate alternative 9.7.1
2 - 16
2.9 Reserved Words
The identifiers listed below are called reserved words and are reserved for
special significance in the language. For readability of this manual, the
reserved words appear in lower case boldface.
abort declare generic of select
abs delay goto or separate
accept delta others subtype
access digits if out
all do in task
and is package terminate
array pragma then
at else private type
elsif limited procedure
end loop
begin entry raise use
body exception range
exit mod record when
rem while
new renames with
case for not return
constant function null reverse xor
A reserved word must not be used as a declared identifier.
Notes:
Reserved words differing only in the use of corresponding upper and lower
case letters are considered as the same (see 2.3). In some attributes the
identifier that appears after the apostrophe is identical to some reserved
word.
References: attribute 4.1.4, declaration 3.1, identifier 2.3, lower case
letter 2.1, upper case letter 2.1
2.10 Allowable Replacements of Characters
The following replacements are allowed for the vertical bar, sharp, and
quotation basic characters:
- A vertical bar character (|) can be replaced by an exclamation mark
(!) where used as a delimiter.
- The sharp characters (#) of a based literal can be replaced by colons
(:) provided that the replacement is done for both occurrences.
- The quotation characters (") used as string brackets at both ends of a
string literal can be replaced by percent characters (%) provided that
2 - 17
the enclosed sequence of characters contains no quotation character,
and provided that both string brackets are replaced. Any percent
character within the sequence of characters must then be doubled and
each such doubled percent character is interpreted as a single percent
character value.
2 - 18
These replacements do not change the meaning of the program.
Notes:
It is recommended that use of the replacements for the vertical bar, sharp,
and quotation characters be restricted to cases where the corresponding
graphical symbols are not available. Note that the vertical bar appears as
a broken bar on some equipment; replacement is not recommended in this
case.
The rules given for identifiers and numeric literals are such that lower
case and upper case letters can be used indifferently; these lexical
elements can thus be written using only characters of the basic character
set. If a string literal of the predefined type STRING contains characters
that are not in the basic character set, the same sequence of character
values can be obtained by catenating string literals that contain only
characters of the basic character set with suitable character constants
declared in the predefined package ASCII. Thus the string literal "AB$CD"
could be replaced by "AB" & ASCII.DOLLAR & "CD". Similarly, the string
literal "ABcd" with lower case letters could be replaced by "AB" &
ASCII.LC_C & ASCII.LC_D.
References: ascii predefined package C, based literal 2.4.2, basic
character 2.1, catenation operation 4.5.3, character value 3.5.2, delimiter
2.2, graphic character 2.1, graphical symbol 2.1, identifier 2.3, lexical
element 2.2, lower case letter 2.1, numeric literal 2.4, string bracket
2.6, string literal 2.6, upper case letter 2.1
2 - 19